A human-human train timetable dialogue corpus

Authors

  • Filip Jurcícek
  • Jirí Zahradil
  • Libor Jelínek
Abstract

This paper describes progress in the development of a human-human dialogue corpus. The corpus contains transcribed phone calls from users to a train timetable information center. The calls consist of inquiries about the callers' train travel plans. The corpus is based on transcriptions of user inquiries that were previously collected for a train timetable information center. We enriched these transcriptions with dialogue act tags, which incorporate an abstract semantic annotation. The corpus comprises recorded speech of both operators and users, an orthographic transcription, a normalized transcription, a normalized transcription with named entities, and dialogue act tags with abstract semantic annotation. A combination of a dialogue act tagset and an abstract semantic annotation is proposed. A technique for dialogue act tagging and abstract semantic annotation is described and applied.

Similar resources

Czech-Sign Speech Corpus for Semantic Based Machine Translation

This paper describes progress in the development of a human-human dialogue corpus for machine translation of spoken language. We have chosen a semantically annotated corpus of phone calls to a train timetable information center. The phone calls consist of inquiries regarding the callers' train travel plans. The corpus dialogue act tags incorporate abstract semantic meaning. We have enriched a part of th...

Use of Negative Examples in Training the HVS Semantic Model

This paper describes the use of negative examples in training the HVS semantic model. We present a novel initialization of the lexical model using negative examples extracted automatically from a semantic corpus, as well as a description of an algorithm for extracting these examples. We evaluated the use of negative examples on a closed-domain human-human train timetable dialogue corpus. We significan...

Robust dialogue-state dependent language modeling using leaving-one-out

The use of dialogue-state dependent language models in automatic inquiry systems can improve speech recognition and understanding if a reasonable prediction of the dialogue state is feasible. In this paper, the dialogue state is defined as the set of parameters which are contained in the system prompt. For each dialogue state a separate language model is constructed. In order to obtain robust l...

Multi-feature Error Detection in Spoken Dialogue Systems

The present paper evaluates the role that selected features and feature combinations play in error detection in spoken dialogue systems. We investigate the relevance of various readily available features extracted from a corpus of dialogues with a train timetable information system, using RIPPER, a rule-inducing machine learning algorithm. The learning task consists of the identification of commun...

Corpus-Based Information Presentation for a Spoken Public Transport Information System

The Alparon project aims to improve VIOS, Openbaar Vervoer Reisinformatie's (OVR) automated speech processing system for public transport information, by using a corpus-based approach. The shortcomings of the current system have been investigated, and a study is made of how dialogues in the OVR domain usually occur between a human operator and a client. While centering our attention on the pres...

Publication date: 2005